Local Variation as a Statistical Hypothesis Test


Michael Baltaxe • Peter Meer • Michael Lindenbaum 


Abstract The goal of image oversegmentation is to divide an image into several pieces, each of which should ideally be part of an object. One of the simplest and yet most effective oversegmentation algorithms is known as local variation (LV) (Felzenszwalb and Huttenlocher 2004). In this work, we study this algorithm and show that algorithms similar to LV can be devised by applying different statistical models and decisions, thus providing further theoretical justification and a well-founded explanation for the unexpectedly high performance of the LV approach. Some of these algorithms are based on statistics of natural images and on a hypothesis testing decision; we denote these algorithms probabilistic local variation (pLV). The best pLV algorithm, which relies on censored estimation, presents state-of-the-art results while keeping the same computational complexity as the LV algorithm.
1 Introduction


Image segmentation is the procedure of partitioning an input image into several meaningful pieces or segments, each of which should be semantically complete (i.e., an item or structure by itself). Oversegmentation is a less demanding type of segmentation. The aim is to group several pixels in an image into a single unit called a superpixel (Ren and Malik 2003) so that it is fully contained within an object; it represents a fragment of a conceptually meaningful structure.

Oversegmentation is an attractive way to compact an image into a more succinct representation. Thus, it could be used as a preprocessing step to improve the performance of algorithms that deal with higher level computer vision tasks. For example, superpixels have been used for discovering the support of objects (Rosenfeld and Weinshall 2011), extracting 3D geometry (Hoiem et al. 2007), multiclass object segmentation (Gould et al. 2008), scene labeling (Farabet et al. 2013), objectness measurement in image windows (Alexe et al. 2012), scene classification (Juneja et al. 2013), floor plan reconstruction (Cabral and Furukawa 2014), object description (Delaitre et al. 2012), and egocentric video summarization (Lee and Grauman 2015).


One may ask whether specialized oversegmentation processes are needed at all, given that several excellent segmentation methods were recently proposed. Working in the high recall regime, these algorithms could yield excellent oversegmentation accuracy. Unfortunately, however, they are complex and therefore relatively slow. The segmentation approach described in Arbelaez et al. (2011), for example, combines the gPb edge detector and the oriented watershed transform to give very high precision and recall, but requires over 240 seconds per frame (as reported in Dollár and Zitnick (2015)). The approach of Ren and Shakhnarovich (2013) uses a large number of classifiers and needs 30 seconds. Moreover, while the recently introduced edge detector (Dollár and Zitnick 2015) is both accurate and very fast (0.08 seconds per frame), it does not provide closed edges and a consistent segmentation. Using this edge detector for hierarchical multiscale segmentation (Arbelaez et al. 2014) indeed achieves state-of-the-art results in terms of precision and recall but requires a processing time of 15 seconds per image. Thus, the time required for these methods is often too high for a preprocessing stage, and specialized oversegmentation algorithms, running in one second or less, are then preferred.

One of the many popular approaches to oversegmentation is the mean shift algorithm, which regards the image values as a set of random samples and finds the peaks of the associated PDF, thereby dividing the data into corresponding clusters (Comaniciu and Meer 2002). The watershed method is a morphological approach which interprets the gradient magnitude image as a topographical map. It finds the catchment basins and defines them as segments (Meyer 1994). Turbopixels is a level set approach, which evolves a set of curves so that they attach to edges in the image and eventually define the borders of superpixels (Levinshtein et al. 2009). The SLIC superpixel algorithm is based on k-means restricted search, and takes into account both spatial and color proximity (Achanta et al. 2012). The entropy rate superpixels (ERS) (Liu et al. 2011) approach partitions the image by optimizing an objective function that includes the entropy of a random walk on a graph representing the image, and a term balancing the segments' sizes. A thorough description of common oversegmentation methods can be found in Achanta et al. (2012).


The local variation (LV) algorithm by Felzenszwalb and Huttenlocher (2004) is a widely used, fast and accurate oversegmentation method. It uses a graph representation of the image to iteratively perform greedy merge decisions by evaluating the evidence for an edge between two segments. Although all the decisions are greedy and local, the authors showed that, in a sense, the final segmentation has desirable global properties. The decision criterion is partially heuristic and yet the algorithm provides accurate results. Moreover, it is efficient, with complexity $O(n \log n)$.

Throughout this paper we use the recall and undersegmentation error as measures of quality for oversegmentation, just as is done in Levinshtein et al. (2009), Achanta et al. (2012), and Liu et al. (2011). The recall (Martin et al. 2004) is the fraction of ground truth boundary pixels that are matched by the boundaries defined by the algorithm. The undersegmentation error (Levinshtein et al. 2009) quantifies the area of regions added to segments due to incorrect merges. Figure 1 presents a comparison of the aforementioned algorithms, together with probabilistic local variation (pLV), the method introduced in this paper.^1

Fig. 1 Recall (top) and undersegmentation error (bottom) for probabilistic local variation and other common algorithms.

Understanding the remarkably good performance of the greedy LV algorithm was the motivation for our work, which makes three contributions:

1. We briefly analyze the properties of the LV algorithm and show that the combination of all its ingredients is essential for high performance.

2. We examine several probabilistic models and show that LV-like algorithms could be derived by statistical consideration. One variation involves the estimation of the maximum sample drawn from a uniform distribution. The second uses natural image statistics and a hypothesis testing decision. This gives additional viewpoints, further justifies the original version of the algorithm, and explains the excellent performance obtained from such a simple, fast method.

3. Following these models, we provide a new algorithm, which is similar to LV but relies on statistical arguments. This algorithm matches the best performance among the methods in the literature, at least for oversegmentation to many segments, but is faster.

^1 Code available at http://cis.cs.technion.ac.il/index.php/projects/probabilistic-local-variation

The structure of the paper is as follows. We present the LV algorithm in section 2. We study this algorithm empirically in section 3. In section 4 we introduce the first LV-like algorithm (based on maximum estimation), which does not perform as well as the original LV, but motivates the alternative and more successful algorithm of section 5 (based on natural image statistics). Section 6 presents our experimental results. A brief discussion of single linkage algorithms (which include the LV and pLV algorithms) as compared to non-single-linkage, hierarchical algorithms is given in section 7. Finally, section 8 concludes.


2 Local Variation Algorithm 


The LV algorithm is a single-linkage, graph based, hierarchical clustering process. Let $G = (V, E)$ be a graph created from the input image, with vertices on the pixels and edges between each pair of neighboring pixels. Define a weight function on the edges, $w : E \to \mathbb{R}$, representing pixel dissimilarity (for example, the RGB color distance between pixels). Likewise, let a component (segment) $C_i$ be a set of connected pixels. The components change throughout the segmentation process, and, initially, the set of components $\{C_i\}$ is the set of pixels.


Following Felzenszwalb and Huttenlocher (2004), define the internal dissimilarity of component $C_i$, denoted by $Int(C_i)$, and a threshold function called the minimum internal difference and denoted $MInt(C_i, C_j)$, as:

$$Int(C_i) = \max_{e \in MST(C_i)} w(e) \tag{1}$$

$$MInt(C_i, C_j) = \min_{x \in \{i,j\}} \left( Int(C_x) + T(C_x) \right), \tag{2}$$

where $MST(C_i)$ is a minimum spanning tree of $C_i$, and $T(C_i) = K/|C_i|$ is a component dependent function, in which $K$ is a user controlled parameter and $|C_i|$ denotes the number of vertices in component $C_i$.

The LV algorithm for image oversegmentation is presented in algorithm 1. Intuitively, we can see that two components are merged only if the lightest edge that connects them is lighter than the heaviest edge in the MST of the components plus a margin. Since the edges are sorted in step 1, the edges causing merges are exactly those that would be selected by Kruskal's MST algorithm (Kruskal 1956). The parameter $K$ controls the number of segments in the output segmentation: increasing $K$ implies that more edges satisfy the merge condition and more merges are performed.

Algorithm 1 Local Variation Algorithm

Input: Weighted graph $G = (V, E)$ with weights $w(e)$, $e \in E$, defined by an image.

Output: Set of components $C_1, \ldots, C_n$ defining segments

1: Sort $E$ by non-decreasing edge weight $(e_1, e_2, \ldots, e_m)$
2: Initialize segmentation $S^0$ with each vertex being a component
3: for all $q = 1, \ldots, m$ do
4:     $e_q = (v_i, v_j) \leftarrow$ edge with the $q$th lightest weight
5:     $C_i^{q-1} \leftarrow$ component of $S^{q-1}$ containing $v_i$
6:     $C_j^{q-1} \leftarrow$ component of $S^{q-1}$ containing $v_j$
7:     if $\left( w(e_q) \le MInt(C_i^{q-1}, C_j^{q-1}) \right) \wedge \left( C_i^{q-1} \ne C_j^{q-1} \right)$ then
8:         $S^q = S^{q-1} \cup \{ C_i^{q-1} \cup C_j^{q-1} \} \setminus \{ C_i^{q-1}, C_j^{q-1} \}$
9:     else
10:        $S^q = S^{q-1}$
11:    end if
12: end for
13: Postprocessing: Merge all small segments to the neighbor with closest color.
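The merge loop of algorithm 1 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the names `DisjointSet` and `local_variation` are hypothetical, the graph is passed as a list of `(weight, u, v)` tuples, a union-find structure tracks $Int(C)$ and $|C|$, and the postprocessing step (line 13) is omitted.

```python
class DisjointSet:
    """Union-find that also stores |C| and Int(C) at each component root."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n          # |C|
        self.internal = [0.0] * n    # Int(C): heaviest MST edge inside C

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b, w):
        # a and b are distinct roots; w is the weight of the merging edge.
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        # Edges are visited in non-decreasing order, so w is the new maximum.
        self.internal[a] = w


def local_variation(n_vertices, edges, K):
    """edges: iterable of (weight, u, v). Returns one root label per vertex."""
    ds = DisjointSet(n_vertices)
    for w, u, v in sorted(edges):                      # step 1
        a, b = ds.find(u), ds.find(v)                  # steps 5-6
        if a == b:
            continue
        m_int = min(ds.internal[a] + K / ds.size[a],   # eq. (2)
                    ds.internal[b] + K / ds.size[b])
        if w <= m_int:                                 # step 7
            ds.union(a, b, w)                          # step 8
    return [ds.find(v) for v in range(n_vertices)]
```

For example, on a six-pixel chain whose two halves differ strongly in intensity, a small `K` keeps the halves apart while a very large `K` merges everything, in line with the role of $K$ described above.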

Oversegmentation algorithms usually include a post-processing stage where small segments are removed (line 13 in algorithm 1). We consider a segment as small when its size is below 10% of the average expected segment size, and merge it with the neighbor with the smallest color difference.

Compared to other oversegmentation algorithms, LV is among the best in terms of recall and running time. It is thus often the method of choice even though its undersegmentation error is not as small as that of some other algorithms; see Achanta et al. (2012) and figure 1.

The high accuracy obtained by the greedy LV algorithm is impressive. It is mainly due to the adaptive threshold $Int(C_i) + T(C_i)$, which depends on two components: the distribution of weights within the segments and their size. The particular combination of these two components meets the criterion used to decide whether a segmentation is too fine or too coarse in Felzenszwalb and Huttenlocher (2004), but is not theoretically supported otherwise. In this paper we analyze the LV algorithm empirically and propose two statistical interpretations that lead, eventually, to LV-like algorithms that follow a statistical decision procedure. One of these new algorithms, denoted as probabilistic local variation (or more specifically, pLV-ML-Cen), maintains LV's desirable properties, and improves its recall and undersegmentation error.

Fig. 2 The LV method against its reduced versions. No reduced version could achieve the performance of the original algorithm.


3 Empirical Study of the LV Algorithm


To analyze the importance of each of the two aforementioned components in LV, we test reduced versions of the algorithm by systematically removing each component:

1. Greedy Merging: By setting $MInt(C_i, C_j) = \infty$, the algorithm depends neither on the distribution nor on the segment sizes. This implies that the segments are merged greedily in order of non-decreasing edge weight.

2. LV with a Constant Threshold: Setting $T(C_i) = A$, where $A$ is a constant, makes the decision size independent (but distribution dependent). Performance is substantially reduced and very large segments are created.

3. Area Based Merging: The condition in line 7 of algorithm 1 is replaced by $\min_{x \in \{i,j\}} |C_x| < A$, where $A$ is a constant. This condition depends only on the segment size but not on the distribution, yielding superpixels with roughly the same area.

4. LV without Removing Small Segments: A post-processing step in LV removes small components (line 13 in algorithm 1). Without this step, a lot of very small, meaningless segments remain, implying that many erroneous merges are performed to obtain a prespecified number of segments.

Figure 2 compares the recall of LV and its reduced versions. All reduced versions yield lower recall than the original algorithm. Thus we conclude that both the distribution and size dependent terms are crucial to LV's performance.


4 Interpreting LV as Maximum Estimation

4.1 Estimating the Maximum Sample Drawn from a Uniform Distribution


The threshold used in LV has several possible statistical interpretations. In this section we consider one interpretation, which suggests that LV's decision rule is similar to maximum value estimation.

Consider a set of samples drawn from a uniform distribution specified by an unknown interval $[\min_U, \max_U]$. We want to estimate the parameters of the uniform distribution, $\min_U$, $\max_U$, from the samples.^2 For the special case where $\min_U = 0$, let $m$ be the sample maximum and $S$ be the set size. Then, the minimum variance unbiased estimator for the maximum value is given by (Larsen and Marx 2012)

$$\widehat{\max}_U = m + \frac{m}{S}. \tag{3}$$


4.2 Interpreting LV as Maximum Estimation


The estimate (3) seems similar to the threshold expression used in the local variation algorithm (algorithm 1 and Felzenszwalb and Huttenlocher (2004)). Both expressions contain two terms. The first is a distribution related term, $m$, which is the maximal observed value. The second term in both cases is size dependent. The two expressions differ in that, in expression (3), the size dependent term depends also on the maximal observed value, $m$. Thus, we hypothesize that an LV-like process would be obtained by considering the weight values in each segment, estimating their maximum under the uniform distribution assumption, and testing whether a new weight falls below the estimated maximum and therefore belongs to the same distribution. If the weight satisfies this test for the two segments, then merging them is justified.

Thus the algorithm is exactly like algorithm 1 (LV), except that $MInt(C_i, C_j)$ is replaced by:

$$MInt(C_i, C_j) = \min_{x \in \{i,j\}} \left( Int(C_x) + \frac{Int(C_x) + c}{|C_x|} \right). \tag{4}$$


In theory the constant $c$ should be 0. However, due to image quantization, the maximum observed value in a small segment is often 0, which prevents any further merges. Note also that the difference between two continuous grey levels may be almost 1, and yet, the difference between their quantized values is zero. To allow for these errors, we set $c = 1$. Otherwise, this LV-like process, denoted as LV-MaxEst, is parameterless.

Figure 5 presents an example. Note that the overall segmentation still does not seem natural: the segments on the bird look natural, but those in the sky do not.

^2 This is the continuous version of the problem known in the statistical theory literature as the German tank problem (Ruggles and Brodie 1947), because its solution was used by the Allies in WW2 to estimate the number of tanks produced by the Germans from the serial numbers of captured tanks.
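To make the estimator (3) concrete, the following Python sketch computes $m + m/S$ for a set of samples and checks by simulation that it is close to unbiased when $\min_U = 0$. The function names, the seed, and the simulation settings are our own illustrative choices and are not taken from the paper.

```python
import random


def estimate_uniform_max(samples):
    """Eq. (3): m + m/S, for samples assumed drawn from U[0, max_U]."""
    m = max(samples)      # sample maximum
    S = len(samples)      # set size
    return m + m / S


def mean_estimate(true_max, S, trials, seed=0):
    """Average of the estimator over repeated draws of S uniform samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        draw = [rng.uniform(0.0, true_max) for _ in range(S)]
        total += estimate_uniform_max(draw)
    return total / trials
```

Note that `estimate_uniform_max([0, 0, 0])` evaluates to zero: when every observed weight is 0, the estimated maximum is 0 as well, which is the quantization issue that the constant $c$ is meant to address.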



4.3 A Controllable Segmentation Based on Maximum 
Estimation 

Being parameterless, the decision rule based on the maximum estimation does not allow us to control the oversegmentation level. A simple, controllable extension would be to change $MInt(C_i, C_j)$ to:


$$MInt(C_i, C_j) = \min_{x \in \{i,j\}} \left( Int(C_x) + k\,\frac{Int(C_x) + c}{|C_x|} \right), \tag{5}$$

where the constant $k$ controls the number of superpixels. We denote this method LV-MaxEst-c (where the suffix "c" stands for "controlled").

Interpreting LV as a maximum estimation problem sheds some light on the algorithm; however, some of the assumptions used in this section are not accurate and the decision rule in equation (5) actually does not perform as well as the original LV algorithm (see section 6). This motivates the alternative model presented next.



5 Interpreting LV as Hypothesis Testing 

The statistical interpretation in section 4 is closely related to the original LV formulation but the assumption of a uniform distribution for edge weights seems unjustified. In this section, we present an alternative statistical model associated with natural image statistics.


5.1 Natural Image Statistics 


Natural image statistics have been intensively explored over the last two decades. Statistical models consider specific image descriptors such as wavelet coefficients (De Bonet and Viola 1997) or intensity difference between adjacent pixels (Grenander and Srivastava 2001) and characterize them statistically. A common way to model the behavior of these image descriptors is by means of the generalized Gaussian distribution (Mallat 1989),

$$P(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-\left(|x|/\alpha\right)^{\beta}}, \tag{6}$$

where $\Gamma$ is the gamma function and, typically, $\alpha$ varies depending on the descriptor used and $\beta$ falls in the range $[0.5, 0.8]$ (Srivastava et al. 2003). This model has been successfully used for, e.g., image denoising (Moulin and Liu 2006) and image segmentation (Heiler and Schnörr 2005).

Fig. 3 A natural image and semilog plot with the edge weight histogram of several edge sets. See text for details.

We are interested in the weights of the graph edges, which are either the absolute differences between LUV color vectors or simply intensity differences. The intensity differences are closely related to some wavelet coefficients and to gradient strengths, both of which were modeled with the generalized Gaussian distribution (Mallat 1989; Huang and Mumford 1999).

The population we consider is somewhat different, however. We are interested only in weights that are part of the MST and are inside segments (and not between them). We checked the validity of the exponential model, a particular case of the generalized Gaussian distribution with $\beta = 1$, on several images and found that the exponential assumption is reasonable; see figure 3, which shows one image example and 4 plots of the LUV edge weight statistics:

- Weights of all edges in the image.
- Weights of all edges within image segments, as marked by a human (as given in BSDS300 (Martin et al. 2001)).
- Weights of edges in an MST of the full image.
- Weights of edges in an MST of each segment, as marked by a human.


As expected, the distributions of the edge weights in the MSTs are biased towards lower weights compared to the distributions obtained from the edges in the complete image. On the other hand, the edge weight distributions within the segments and throughout the entire image are similar. This is also expected because the number of edges crossing segments is relatively small. Clearly, all distributions are close to an exponential law (observed as a straight line in the semilog plot). In this work, we shall therefore assume that the model underlying the distribution of edge weights is the exponential distribution:

$$P(x) = \lambda e^{-\lambda x}. \tag{7}$$

5.2 Merge Decisions from Hypothesis Testing 


Consider the local merging context where we need to decide whether two segments $S_i$, $S_j$ merge along edge $e$. We propose to make this decision by testing the hypothesis that the weight $w(e)$ of the edge $e$ belongs to the distribution $P_i(x) = \lambda_i e^{-\lambda_i x}$ of the weights in each one of the segments. If this hypothesis is rejected for at least one of the segments, the segments are not merged. Otherwise the segments are merged. We refer to this method as probabilistic local variation (pLV).

To test the hypothesis that $w(e)$ belongs to $S_i$, we consider the probability:

$$P_i(x > w(e)) = \int_{w(e)}^{\infty} \lambda_i e^{-\lambda_i x}\,dx = e^{-\lambda_i w(e)}. \tag{8}$$

The hypothesis is rejected with a level of significance $\delta$ whenever $P_i(x > w(e)) < \delta$.

Thus, the probabilistic local variation approach uses the following alternative rule for deciding whether two segments should be merged:

1. Let $e^*$ be the edge with the minimum weight connecting two segments, $S_i$, $S_j$.

2. For each segment $S_a \in \{S_i, S_j\}$, fit an exponential distribution, $P_a(x)$, to the weights in it.

3. For each segment $S_a \in \{S_i, S_j\}$, test the hypothesis that $e^*$ belongs to the corresponding distribution using the hypothesis test, $P_a(x > w(e^*)) < \delta$.

4. If the hypothesis is rejected in at least one of the tests, do not merge. Otherwise merge.
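The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the exponential fit uses the plain sample-mean (maximum likelihood) rate, the function names are hypothetical, and each segment is assumed to contain at least one strictly positive weight so that the rate is finite.

```python
import math


def fit_exponential(weights):
    """Rate of an exponential fitted by maximum likelihood: n / sum(weights)."""
    return len(weights) / sum(weights)


def tail_probability(lam, w):
    """P(x > w) under an exponential with rate lam, as in eq. (8)."""
    return math.exp(-lam * w)


def should_merge(weights_i, weights_j, w_edge, delta):
    """Steps 2-4: reject the merge if the test fails for either segment."""
    for weights in (weights_i, weights_j):
        lam = fit_exponential(weights)
        if tail_probability(lam, w_edge) < delta:
            return False   # hypothesis rejected for this segment
    return True
```

For instance, with internal weights `[1, 1, 2]` and `[2, 2, 2]` and $\delta = 0.05$, a connecting edge of weight 3 passes both tests, whereas an edge of weight 5 is rejected by the first segment.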

In the rest of this section we describe several ways to estimate the parameters and the implied distribution.

A straightforward estimate for $\lambda$ would be the maximum likelihood (ML) estimator. Given a population of $n$ samples, $\{x_1, x_2, \ldots, x_n\}$, drawn i.i.d. from the exponential distribution (7), the maximum likelihood estimator for the parameter, $\hat{\lambda}_{ML}$, is

$$\hat{\lambda}_{ML} = \frac{1}{\bar{x}} = \frac{n}{\sum_{i=1}^{n} x_i}. \tag{9}$$

When using this estimator, we will denote the probabilistic local variation as pLV-ML.


The ML estimator is, however, noise prone and highly unstable when the sample size is small, as is the case when the merged segments are small. One way to make a robust decision is by using confidence intervals (CI) for $\lambda$. The symmetric $100(1-\alpha)\%$ CI for $\lambda$, for a population of $n$ samples drawn from the exponential distribution, is given by:

$$\left( \hat{\lambda}_{ML}\,\frac{\chi^2_{1-\alpha/2,2n}}{2n},\ \ \hat{\lambda}_{ML}\,\frac{\chi^2_{\alpha/2,2n}}{2n} \right), \tag{10}$$

where $\chi^2_{p,\nu}$ is the value specifying the tail of weight $p$ in a $\chi^2$ distribution with $\nu$ degrees of freedom (see section 7.6 of Ross (2009) for details regarding parameter estimation for the exponential distribution). $\chi^2_{p,\nu}$ increases approximately linearly with $\nu = 2n$ and, for large $n$, approaches $\nu = 2n$ (their ratio tends to 1), implying that the confidence interval shrinks as the segment size grows, as intuitively expected.
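The interval (10) can be evaluated numerically as in the sketch below. To stay within the Python standard library, the $\chi^2$ tail values are obtained with the Wilson-Hilferty approximation, which is our own substitution and is only rough for very few degrees of freedom; an exact quantile routine could be used instead. The function names and the default $\alpha = 0.05$ are assumptions made for illustration.

```python
from statistics import NormalDist


def chi2_upper_tail_value(p, nu):
    """Approximate value leaving probability p in the upper tail of a
    chi-square distribution with nu degrees of freedom (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - p)
    a = 2.0 / (9.0 * nu)
    return nu * (1.0 - a + z * a ** 0.5) ** 3


def lambda_confidence_interval(weights, alpha=0.05):
    """Returns (lower limit, ML estimate, upper limit) following eq. (10)."""
    n = len(weights)
    lam_ml = n / sum(weights)
    lower = lam_ml * chi2_upper_tail_value(1.0 - alpha / 2.0, 2 * n) / (2 * n)
    upper = lam_ml * chi2_upper_tail_value(alpha / 2.0, 2 * n) / (2 * n)
    return lower, lam_ml, upper
```

With ten weights the interval around the ML estimate is wide, and with a hundred weights of the same mean it is considerably narrower, which matches the dependence on segment size described above.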

The question of which value should be chosen within the CI remains. The effect of choosing a particular $\lambda$ value is not straightforward because changes in $\lambda$ influence other parameters. Suppose we are interested in a final segmentation with $N$ superpixels. Choosing $\lambda$, say, at the lower limit of the confidence interval, must be compensated for by increasing $\delta$. Otherwise, the threshold will be higher and too many merges will be performed. As explained in section 5.3, the edge weights used for estimating $\lambda$ are biased towards smaller values, which makes $\hat{\lambda}_{ML}$ biased to larger values. Therefore, we prefer the lower limit of the CI (10), and reject a merge whenever $P(x > w(e)) = \exp\left( -\hat{\lambda}_{ML}\,\frac{\chi^2_{1-\alpha/2,2n}}{2n}\, w(e) \right) < \delta$, or equivalently, whenever

$$w(e) > \ln(1/\delta) \Big/ \left( \hat{\lambda}_{ML}\,\frac{\chi^2_{1-\alpha/2,2n}}{2n} \right). \tag{11}$$

Replacing $\hat{\lambda}_{ML}$ with the lower limit of the CI for $\lambda$ (method denoted as pLV-ML-CI) indeed yields better recall, very similar to that of LV; see figure 6 (top). The role of the confidence parameter $\delta$ is analogous to that of $K$ in the original LV algorithm: making $\delta$ larger results in more segments.

Note also that qualitatively, the algorithm behaves according to the LV principle and gives priority to the merging of smaller superpixels. This is not immediately clear, however, because, as discussed above, choosing $\lambda$ at the lower limit of the CI is compensated for by increasing $\delta$. This compensation, however, is non-uniform. While the level of significance $\delta$ is uniform and applies to all merging decisions, smaller segments are associated with larger confidence intervals, which makes the lower limit of their confidence intervals lower on average. Therefore, the threshold on the edge weight (11) is higher for smaller superpixels, which gives them priority to merge.

While this model achieves nice segmentations and sheds some new light on the LV algorithm, we observe that the statistical assumption underlying the estimation of $\lambda$ (that the measurements are obtained by i.i.d. sampling) seems inaccurate. First, the samples we have are taken at nearby locations, which makes them correlated. Moreover, they are sampled with preference for lower weight edges. The first problem seems to be minor because the weights are derivative values, which are less correlated than the image intensities. The second problem, preferring smaller values, is considered in the next subsection.


5.3 Parameter Estimation under Biased Sampling


Contrary to the assumption used in section 5.2, the weights are sampled from the lightest to the heaviest and are not drawn i.i.d. Thus, at each step the population inside a segment is biased towards low values. In what follows we aim to correct this bias.

Suppose the segment has $m + 1$ pixels, and at some point in time our method of sampling provides us with $n$ measurements, which are the lowest $n$ elements in the MST of the segment, which contains $m$ edges. Our task is to estimate the value of $\lambda$. The structure of this problem is an approximation to the one defined by type II censoring, in which $m$ random variables are drawn i.i.d. but only the smallest $n < m$ values are observed (an approximation since the true segment may be split between several segments specified by the algorithm). This type of sampling is common in reliability estimation, where one tries to study the failure rate of some process/machine by analyzing only a subset composed of the samples which failed first (Epstein and Sobel 1953). Under these conditions, and following Epstein and Sobel (1953), the maximum likelihood estimate for $\lambda$ can be derived as follows:

A sequence of size $m$ is called partially $n$-ordered if the first $n$ elements in it are non-decreasing (or non-increasing), and not larger (or not smaller) than any of the remaining $m - n$ elements. Consider a sequence, $Y$, of $m$ samples drawn i.i.d. from some probability distribution. A partially $n$-ordered sequence, $X$, may be generated from it by choosing the $n$ smallest elements from $Y$, and making them the first $n$ elements of $X$, in non-decreasing order. The remaining $m - n$ elements in $X$ are identical to the remaining $m - n$ elements in $Y$. Their order is the same order as in $Y$.

Note that many sequences $Y$ may correspond to the same partially $n$-ordered $X$. The first $n$ elements may come from $H = \frac{m!}{(m-n)!}$ different sets of locations, implying that the probability of getting a particular partially $n$-ordered sequence is $H$ times the probability of getting the original sequence $Y$ of i.i.d. drawn samples. Therefore, the joint probability density of observing the values $x_1, \ldots, x_n$ is

$$\begin{aligned}
f(x_1, \ldots, x_n) &= H \cdot P(x_{j_1}, \ldots, x_{j_n}) \cdot P(x_{j_{n+1}} > x_{j_n}, \ldots, x_{j_m} > x_{j_n}) \\
&= H \cdot P(x_{j_1}) \cdots P(x_{j_n}) \cdot P(x_{j_{n+1}} > x_{j_n}) \cdots P(x_{j_m} > x_{j_n}) \\
&= H \cdot P(x_1) \cdots P(x_n) \cdot \left( \int_{x_n}^{\infty} \lambda e^{-\lambda x}\,dx \right)^{m-n} \\
&= H \lambda^n e^{-\lambda x_1} \cdots e^{-\lambda x_n}\, e^{-\lambda (m-n) x_n}.
\end{aligned} \tag{12}$$

The value of $\lambda$ yielding the maximum likelihood for this function can be found by differentiating $\ln f(x_1, \ldots, x_n)$ with respect to $\lambda$ and equating to zero, yielding

$$\hat{\lambda}_{ML\text{-}Cen} = \frac{n}{\sum_{i=1}^{n} x_i + (m-n)\,x_n}, \tag{13}$$


where the notation ML-Cen is for censored maximum 
likelihood. 
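The censored estimator (13) can be written as a short Python function. This is a sketch with a hypothetical function name; it falls back to the uncensored ML estimate $n / \sum_i x_i$ when $n \ge m$, and it assumes at least one strictly positive observed weight.

```python
def censored_ml_rate(observed, m):
    """Eq. (13): `observed` holds the n smallest of m exponential samples."""
    xs = sorted(observed)
    n = len(xs)
    if n >= m:
        # No censored samples remain: plain ML estimate.
        return n / sum(xs)
    # The m - n unobserved samples are each at least as large as xs[-1].
    return n / (sum(xs) + (m - n) * xs[-1])
```

For the observations `[1, 2, 3]` the uncensored estimate is 0.5, while assuming that they are the three smallest of $m = 5$ samples lowers the estimate to 0.25, which illustrates how the correction counteracts the bias towards large $\lambda$.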

Finally, consider a segment $C$ with edge weights $\{x_1, \ldots, x_i, \ldots, x_n\}$, and the merging decision when the edge under study is $e$. Then, combining equations (11) and (13), the one-sided condition for merge rejection becomes:

$$w(e) > HypThr(C) = \frac{2 \ln(1/\delta) \left( \sum_{\substack{x_i \in C \\ n = |C|}} x_i + (m-n)\,x_n \right)}{\chi^2_{1-\alpha/2,2n}}, \tag{14}$$

where $m$ is a user specified parameter reflecting the expected size of the true segment. In the case that $m = n$, $\hat{\lambda}_{ML\text{-}Cen}$ reduces to $\hat{\lambda}_{ML}$, which we use also for $n > m$. To gain further intuition about this threshold, recall that $\chi^2_{1-\alpha/2,2n}$ grows almost linearly with the segment size $n$.

In summary, our hypothesis testing algorithm for image oversegmentation (denoted pLV-ML-Cen) is exactly as algorithm 1, but the minimum internal difference between two components is set to:

$$MInt(C_i, C_j) = \min_{x \in \{i,j\}} HypThr(C_x). \tag{15}$$

The value of $\lambda$ for a given segment characterizes the distribution of its edges. The larger the variability within a segment, the smaller the value of $\lambda$; see figure 4. Figure 5 presents an example segmentation of the probabilistic local variation method with correction for biased sampling. Note that probabilistic local variation appears to have more segments than the original LV. This happens because the LV algorithm produces many elongated and hardly visible segments along the boundaries.

Fig. 4 Bottom: visualization of $\lambda$ for each segment. The brighter the segment, the larger the value of $\lambda$. Thus, white represents smooth segments and black represents textured ones. Top: corresponding oversegmentation.

Probabilistic local variation with censoring is based on different principles and yet follows the same basic behavior as the LV algorithm. We consider the decision condition (14) and observe the following four properties:

1. As discussed above, the denominator $\chi^2_{1-\alpha/2,2n}$ grows with the size of the segment, $n$, while the numerator decreases with $n$. Thus, the threshold decreases with $n$ and gives preference to merges of small segments.

2. The heaviest weight in the MST, $x_n$, appears in the numerator of $HypThr(C)$. Thus, a heavier $x_n$ leads to a higher threshold and to a more likely merge decision.

3. For small segments, where $n \ll m$, the importance of the heaviest edge, $x_n$, is amplified by a factor linear in $(m-n)/n$. For larger segments, though, the amplification factor is smaller, making the average weight of the segment edges more important.

4. It is straightforward to show that there is a predicate for which probabilistic LV leads to a segmentation which is not too fine in the sense specified in Felzenszwalb and Huttenlocher (2004). Showing that it is not too coarse (Felzenszwalb and Huttenlocher 2004) seems harder because the sampling order is not tightly related to our threshold criterion.

Fig. 5 Segmentations obtained by the methods described in the paper (left to right): LV-MaxEst, LV-MaxEst-c, pLV-ML-Cen and LV. The last 3 methods are tuned to have roughly the same number of segments.


5.4 Complexity of Hypothesis Testing 


To calculate the threshold (14), we need to keep three values for each segment: the sum of its elements, the value of the last (heaviest) edge added to it, and its number of elements. All these values are updated in $O(1)$, and thus the complexity of the hypothesis testing method is exactly the same as that of the LV algorithm, namely $O(n \log n)$.
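The constant-time bookkeeping can be sketched as follows. The class `SegmentStats` and its method names are hypothetical, the $\chi^2$ value is obtained with the Wilson-Hilferty approximation (our substitution, rough for very small segments and very small $\alpha$), and a segment without internal edges is given an infinite threshold, which is an assumption since the text does not specify that case.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


def chi2_lower_value(alpha, nu):
    """Wilson-Hilferty approximation of chi^2_{1-alpha/2, nu}, i.e. the value
    with probability alpha/2 below it."""
    z = NormalDist().inv_cdf(alpha / 2.0)
    a = 2.0 / (9.0 * nu)
    return nu * (1.0 - a + z * math.sqrt(a)) ** 3


@dataclass(frozen=True)
class SegmentStats:
    weight_sum: float = 0.0   # sum of the MST edge weights in the segment
    heaviest: float = 0.0     # last (heaviest) edge added, x_n
    count: int = 0            # number of MST edges, n

    def merged_with(self, other, w):
        """O(1) update when two segments are joined by an edge of weight w."""
        return SegmentStats(self.weight_sum + other.weight_sum + w,
                            w,
                            self.count + other.count + 1)

    def hyp_thr(self, m, delta, alpha):
        """Threshold of eq. (14); for n >= m the censoring term is dropped."""
        n = self.count
        if n == 0:
            return math.inf   # assumption: single pixels may always merge
        censored = max(m - n, 0) * self.heaviest
        numerator = 2.0 * math.log(1.0 / delta) * (self.weight_sum + censored)
        return numerator / chi2_lower_value(alpha, 2 * n)
```

Because each merge touches only three numbers, the cost per edge is dominated by the union-find lookup, and the initial sort remains the most expensive step.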


6 Experimental Results 


6.1 Testing Probabilistic Local Variation 


Following common practice, we use the boundary recall and
the undersegmentation error as quantitative performance
measures. The recall (Martin et al. 2004) is the fraction of
ground truth boundary pixels that are matched by the
boundaries defined by the algorithm. The undersegmentation
error (Levinshtein et al. 2009) measures the area of
incorrect merges of true segments (or parts of them); see
Martin et al. (2004) and Levinshtein et al. (2009) for
implementation details. All experiments were performed on
BSDS300 (test) (Martin et al. 2001).
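To make the recall measure concrete, the following Python
sketch computes a simplified boundary recall on label maps
given as nested lists. It is an illustration under stated
assumptions, not the benchmark code: a ground truth boundary
pixel counts as matched if any algorithm boundary pixel lies
within a square neighbourhood of radius tol, whereas the cited
benchmark implementations define the exact matching procedure.
The function names and the tol parameter are hypothetical.

```python
def boundary_pixels(labels):
    """Pixels whose right or lower neighbour carries a different label."""
    h, w = len(labels), len(labels[0])
    out = set()
    for r in range(h):
        for c in range(w):
            right = c + 1 < w and labels[r][c] != labels[r][c + 1]
            below = r + 1 < h and labels[r][c] != labels[r + 1][c]
            if right or below:
                out.add((r, c))
    return out


def boundary_recall(gt_labels, seg_labels, tol=1):
    """Fraction of ground truth boundary pixels with a nearby segment boundary."""
    gt = boundary_pixels(gt_labels)
    seg = boundary_pixels(seg_labels)
    if not gt:
        return 1.0  # nothing to recall
    offsets = range(-tol, tol + 1)
    hits = sum(
        1
        for (r, c) in gt
        if any((r + dr, c + dc) in seg for dr in offsets for dc in offsets)
    )
    return hits / len(gt)
```

A segmentation with no boundaries has recall 0, and a
segmentation identical to the ground truth has recall 1. Since
finer segmentations can only add boundary pixels, recall is
usually reported as a function of the number of segments.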


The first probabilistic version, LV-MaxEst, is based on a
uniform distribution model and estimates its interval. It is
parameterless and gives a single segmentation (see figure 5).
It is difficult to compare its average recall to that of
other methods, because the segmentation of different images
results in different numbers of segments (and different
recall). Therefore, we use the generalized method,
LV-MaxEst-c, for comparison; see section 4.3. The recall
curve is shown in figure 6 (top). It is clearly inferior to
that of LV, but it behaves similarly and is better than
several of the other methods in the literature.

Fig. 6 Recall (top) and undersegmentation error (bottom) of
the LV method against its probabilistic versions proposed in
this paper, as a function of the number of segments. The
proposed hypothesis test (with censored estimation) performs
best for medium to large segment counts.

Fig. 7 Two images segmented with the probabilistic local
variation algorithm (censored version) with parameters
m = 350, $\delta$ = 0.05. The number of segments depends on
the characteristics of the image, yielding 268 (left) and 57
(right) segments respectively.

The other probabilistic version relies on the exponential
distribution model, which is specified by a single parameter
$\lambda$. It is important to estimate this parameter
carefully. The maximum likelihood approach, pLV-ML,
overestimates $\lambda$, yielding poor recall; see figure 6
(top). Estimating the confidence interval (95%) and using its
lower limit, as in pLV-ML-CI (section 5.2), results in the
same recall as LV. Just like LV, the method depends on a
single parameter $\delta$, determined by the number of
segments selected. Observing that the data is biased led to
using censored estimation. The resulting algorithm
(pLV-ML-Cen) gives a more accurate estimate of $\lambda$ and
yields excellent, state-of-the-art results.

This algorithm depends on two parameters: m, the expected
size of the segment, and $\delta$, the level of significance
of the decision. The number of segments is controlled by any
combination of them. For a prespecified number of segments S
(e.g., as needed for comparing different methods), we use the
average segment size m = ImageSize/S, but found that setting
m to any large value (e.g., 200) works equally well for
$S \in [200, 2000]$; $\delta$ is then tuned so that S
segments are obtained. This option was used for generating
figure 6. Alternatively, m and $\delta$ can be set by
empirically maximizing the average performance, which depends
on the image set. Then, using these parameters for a specific
image provides adaptive segmentation, yielding more segments
on “busy” images and fewer segments on smooth ones. See an
example in figure 7.

In terms of recall, probabilistic local variation improves on
LV and matches ERS to achieve the best results for large
numbers of segments. The undersegmentation error is just
behind that of ERS, which is best for this measure; see the
corresponding figure. The running time of our method is
exactly the same as that of LV (0.3 sec on a Pentium 4GB
machine), which is almost as fast as the fastest method
(SLIC) and much faster than the only method that achieves the
same recall (ERS, 2.5 sec).
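The bias that motivates censored estimation in pLV-ML-Cen can
be illustrated numerically. The Python sketch below is not
taken from the paper's code. It contrasts the plain maximum
likelihood estimate of an exponential rate with the textbook
estimate for type-II censored samples, where only the n
smallest of m values are observed, and it assumes that this
textbook estimator corresponds to the one used in the censored
version. The function names are illustrative.

```python
def ml_rate(xs):
    """Plain ML estimate of an exponential rate: n / sum(x)."""
    return len(xs) / sum(xs)


def censored_rate(xs, m):
    """Type-II censored ML estimate of an exponential rate.

    Only the n smallest of m samples are observed; each of the
    m - n unseen samples is known to be at least max(xs).
    """
    n = len(xs)
    return n / (sum(xs) + (m - n) * max(xs))


observed = [1.0, 2.0, 3.0]             # the n = 3 smallest weights
plain = ml_rate(observed)              # 3 / 6
censored = censored_rate(observed, 5)  # 3 / (6 + 2 * 3)
```

Because the unseen values are at least as large as the
heaviest observed one, the plain estimate is too high; the
censored estimate is lower and coincides with the plain one
when m = n.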


6.2 Multi-Class Segmentation Performance 


Extracting superpixels is not an end in itself, and therefore
testing oversegmentation in the context of common tasks is
important. Following the evaluation framework presented in
Achanta et al. (2012), we examine the quality of pLV
(specifically pLV-ML-Cen) in the context of the higher level
task of multi-class segmentation. We perform our experiments
on the MSRC 21-class database (Shotton et al. 2009) and use
the segmentation method of Gould et al. (2008), which
proceeds as follows: the input image is divided into
superpixels and a set of features is calculated on each
superpixel; the calculated features are fed to a set of
previously trained classifiers (one for each class); and the
labeling is selected by minimizing an energy potential on a
conditional random field. Table 1 presents the segmentation
accuracy obtained when selecting different oversegmentation
methods. Note that the accuracy achieved when using pLV is
higher than when using the original LV algorithm.
Furthermore, the accuracy obtained with pLV is exceeded only
by that of the slower ERS algorithm.

Method     pLV     LV      MS      TP      WS      SLIC    ERS
Accuracy   77.0%   74.6%   70.3%   76.0%   74.7%   76.9%   78.0%

Table 1 Multi-class segmentation accuracy when using the
following methods for the oversegmentation stage: pLV, LV,
mean shift (MS), turbopixels (TP), watershed (WS), SLIC
and ERS.


7 Discussion 


7.1 Single-Linkage Clustering 


The LV algorithm and its probabilistic versions considered
here are hierarchical algorithms that, at every step, examine
and possibly merge the two most similar segments.

Being single-linkage algorithms, they specify the
dissimilarity between two segments, X and Y, as the distance
between their two closest elements:

$$D(X,Y) = \min_{x \in X,\, y \in Y} d(x,y),$$

where d(x,y) is some underlying dissimilarity function
between elements. Recall that in the graph-based notation the
dissimilarities between elements are represented by the
corresponding edge weights, and therefore D(X,Y) is the
minimal weight of an edge from X to Y. In the LV and pLV
case, the decision whether to merge two segments is made by
testing, independently for each segment, whether the weight
D(X,Y) belongs to the distribution of weights estimated for
the segment.
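The single-linkage dissimilarity defined above can be written
directly in code. The following Python sketch is illustrative
only; the function names and the (weight, a, b) edge
representation are assumptions made for the example. The
first function follows the definition over arbitrary
elements, and the second is its graph-based counterpart, the
lightest edge crossing between two segments.

```python
def single_linkage(X, Y, d):
    """D(X, Y): smallest dissimilarity d(x, y) over all cross pairs."""
    return min(d(x, y) for x in X for y in Y)


def min_crossing_edge(edges, X, Y):
    """Graph form: minimal weight of an edge with one end in X, one in Y."""
    X, Y = set(X), set(Y)
    crossing = [
        w for w, a, b in edges
        if (a in X and b in Y) or (a in Y and b in X)
    ]
    # Segments that share no edge are treated as infinitely dissimilar.
    return min(crossing) if crossing else float("inf")


gap = single_linkage([0, 1], [5, 9], lambda x, y: abs(x - y))
edges = [(3.0, 0, 1), (7.0, 1, 2), (4.0, 0, 2), (1.0, 2, 3)]
link = min_crossing_edge(edges, {0, 1}, {2, 3})
```

In a Kruskal-style pass over sorted edges, the first edge
found between two segments is exactly this minimal crossing
edge, so it never has to be searched for explicitly.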


This paper focuses on the LV algorithm (Felzenszwalb and
Huttenlocher 2004) and on the implied single-edge based
merging decision. Alternatively, the decision as to whether
two segments should be merged can be made by testing whether
all their weights may be explained by a single, common
distribution. For one-dimensional distributions
characterizing a segment, classical tests such as the
Kolmogorov-Smirnov test (Ross 2009) may be used; see, e.g.,
Pauwels and Frederix (2000). See also Peng et al. (2011) for
another use of a one-dimensional test for segmentation.
Usually, however, more effective, multidimensional
distributions describing, say, the segment's texture or
color, are preferred. Formal tests for multidimensional
distributions are problematic (see, however, Glazer et al.
(2012)). Usually the distribution is approximated as a
mixture of visual words (textons) and the distance between
the histograms serves as a measure of dissimilarity between
segments. A more reliable dissimilarity measure can be
obtained by augmenting this distance with edge information
(Martin et al. 2004).
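As an illustration of the one-dimensional alternative
mentioned above, the following Python sketch computes the
two-sample Kolmogorov-Smirnov statistic between the weights
of two segments and applies the usual large-sample critical
value. It is a generic textbook version, not the procedure of
any of the cited works; the function names and the default
significance level are assumptions.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    # The supremum of |F_a - F_b| is attained at one of the data points.
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def same_distribution(a, b, alpha=0.05):
    """Large-sample KS decision; True means the test does not reject."""
    n, m = len(a), len(b)
    critical = sqrt(-0.5 * log(alpha / 2)) * sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical
```

A merge rule based on this test would join two segments when
same_distribution returns True. Unlike the single-edge
decision, it needs all weights of both segments, which is one
reason why it is more expensive.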

In the context of hierarchical segmentation, the distribution
comparison technique may be used in two different ways. One
approach carries out the merging process according to a fixed
order determined before the process begins, as done in
single-linkage processes. This option is inconsistent with
the hierarchical approach because the most similar pairs of
segments (according to the merging criterion) are not tested
before the others. The other option, to recalculate the order
dynamically after every merge, is computationally expensive.

An effective combination studied in the literature is to
divide the hierarchical merging process into stages. At the
beginning of a stage, every segment is specified as an
element, and the dissimilarity between the elements is
calculated according to an arbitrary dissimilarity measure,
which may depend on the distributions. Then all the merges in
this stage proceed according to a single-linkage algorithm.
This approach has been shown to be a good trade-off between
runtime and accuracy, as presented in Kim et al. (2011), Ren
and Shakhnarovich (2013) and Baltaxe (2014).


8 Conclusion 


The local variation algorithm of Felzenszwalb and
Huttenlocher (2004) is a simple yet amazingly effective
oversegmentation method. In this paper we analyzed the LV
algorithm using statistical and empirical methods and showed
that the algorithm and its performance may be explained by
statistical principles.

We proposed an oversegmentation algorithm, denoted
probabilistic local variation, that is based on hypothesis
testing and on the statistical properties of natural images.

We found that probabilistic local variation is highly
accurate, outperforms almost all other oversegmentation
methods, and runs much faster than the one equally accurate
oversegmentation algorithm (ERS). This is remarkable because
it follows from a statistical interpretation of a 10-year-old
method (LV), which is, incidentally, still one of the best
competitors.